@adityamaru
Contributor

history.db currently lacks the automatic corruption recovery that cache.db has, causing BuildKit to fail on startup if history.db is corrupted. This is inconsistent since both databases are disposable (losing history is inconvenient but not fatal).

This commit:

  • Extracts the safe database opening logic to util/db/boltutil/SafeOpen
  • Updates cache.db to use the shared SafeOpen function
  • Applies the same recovery mechanism to history.db

The recovery mechanism backs up corrupted databases and creates fresh ones, allowing BuildKit to start successfully even after abrupt shutdowns or snapshot-related corruption (common with NoSync + network block devices like Ceph RBD).

Fixes startup failures when history.db is corrupted, matching the resilience already present for cache.db since commit ccc06b7.

Signed-off-by: Aditya Maru <[email protected]>
Signed-off-by: Claude <[email protected]>
Member

@crazy-max crazy-max left a comment

LGTM

@crazy-max crazy-max requested a review from tonistiigi November 21, 2025 09:09
Member

@tonistiigi tonistiigi left a comment

The reason we had this protection for the cache DB is that it uses NoSync, but the history DB does not. Is the corruption reproducible in some way? It looks like it would be a boltdb bug if it happens here.

@jsternberg
Collaborator

@tonistiigi it's feasible with anything that causes external sync issues, such as the Ceph RBD setup mentioned in the description. A vanilla BuildKit running on a normal filesystem is unlikely to encounter it, but including the recovery likely does no harm.

@tonistiigi tonistiigi merged commit af4612c into moby:master Nov 25, 2025
166 of 167 checks passed